Abstract: Many statistical models are algebraic in that they are defined in terms of polynomial
constraints, or in terms of polynomial or rational parametrizations. The parameter spaces of 
such models are typically semi-algebraic subsets of the parameter space of a reference model 
with nice properties, such as for example a regular exponential family. This observation 
leads to the definition of an 'algebraic exponential family'. This new definition provides a 
unified framework for the study of statistical models with algebraic structure. In this paper 
we review the ingredients to this definition and illustrate in examples how computational 
algebraic geometry can be used to solve problems arising in statistical inference in algebraic 
models. 
1. Introduction



Algebra has seen many applications in statistics (e.g. Diaconis, 1988; Viana and Richards,
2001), but it is only rather recently that computational algebraic geometry and related
techniques in commutative algebra and combinatorics have been used to study statistical
models and inference problems. This use of computational algebraic geometry was initiated
in work on exact tests of conditional independence hypotheses in contingency tables
(Diaconis and Sturmfels, 1998). Another line of work in experimental design led to the
monograph by Pistone et al. (2001). 'Algebraic statistics', the buzz word in the titles of
this monograph and the more recent book by Pachter and Sturmfels (2005), has now become
the umbrella term for statistical research involving algebraic geometry. There has
also begun to be a sense of community among researchers working in algebraic statistics 
as reflected by workshops, conferences, and summer schools. One such workshop, the 2005 
Workshop on Algebraic Statistics and Computational Biology held at the Clay Mathe- 
matics Institute led to the Statistica Sinica theme topic, of which this article forms a 
part. Other recent work in algebraic statistics has considered contingency table analysis
(Aoki and Takemura, 2005; Dobra and Sullivant, 2004; Takemura and Aoki, 2005), phylogenetic
tree models (Allman and Rhodes, 2003; Eriksson et al., 2005; Sturmfels and Sullivant, 2005),
maximum likelihood estimation under multinomial sampling (Catanese et al., 2005; Hoşten
et al., 2005), reliability theory (Giglio and Wynn, 2004), and Bayesian networks
(Garcia et al., 2005). A special issue of the Journal of Symbolic Computation emphasizing
the algebraic side emerged following the 2003 Workshop on Computational Algebraic
Statistics at the American Institute of Mathematics.

The algebraic problems studied in algebraic statistics are of a rather diverse nature. At 
the very core of the field, however, lies the notion of an algebraic statistical model. While 
this notion has the potential of serving as a unifying theme for algebraic statistics, there 
does not seem, at present, to exist a unified definition of an algebraic statistical model. 
This lack of unity is apparent even when reading articles by the same authors, where 
two papers might use two different, non-equivalent definitions of an algebraic statistical 
model, for different theoretical reasons. The usual set-up for discussing algebraic statistical 
models has involved first restricting to discrete random variables and then considering 
models that are either conditional independence models or defined parametrically with a 
polynomial or rational parametrization. However, many statistical models for continuous 
random variables also have an algebraic flavor, though currently there has been no posited 
description of a general class of algebraic statistical models that would include models for 
continuous random variables. 

The main goal of this paper is to give a unifying definition of algebraic statistical
models, as well as illustrate the usefulness of the definition in examples. Our approach
is based on the following philosophy. Let $\mathcal{P} = (P_\theta \mid \theta \in \Theta)$ be a statistical model with
parameter space $\Theta \subseteq \mathbb{R}^k$. In this paper, a model such as $\mathcal{P}$ is defined to be a family of
probability distributions on some given sample space. (For a discussion of the notion of a
statistical model see McCullagh (2002), who proposes to refine the traditional definition to
one that ensures that the model extends in a meaningful way under natural extensions of
the sample space.) Suppose that in model $\mathcal{P}$ a statistical inference procedure of interest
is well-behaved. If this is the case, then the properties of the inference procedure in a
submodel $\mathcal{P}_M = (P_\theta \mid \theta \in M)$ are often determined by the geometry of the set $M \subseteq \Theta$.
Hence, if the set $M$ exhibits algebraic structure, then the inference procedure can be studied
using tools from algebraic geometry. This philosophy suggests the following definition. The
semi-algebraic sets appearing in the definition will be defined in Section 3.

Definition 1. Let $\mathcal{P} = (P_\theta \mid \theta \in \Theta)$ be a "well-behaved" statistical model whose parameter
space $\Theta \subseteq \mathbb{R}^k$ has non-empty interior. A submodel $\mathcal{P}_M = (P_\theta \mid \theta \in M)$ is an algebraic
statistical model if there exists a semi-algebraic set $A \subseteq \mathbb{R}^k$ such that $M = A \cap \Theta$.

Definition 1 is intentionally vague and the precise meaning of the adjective "well-behaved"
depends on the context. For example, if asymptotic properties of maximum
likelihood estimators are of interest then the word "well-behaved" could refer to models
satisfying regularity conditions guaranteeing that maximum likelihood estimators are
asymptotically normally distributed. However, one class of statistical models, namely regular
exponential families, can be considered to be well-behaved with respect to nearly any
statistical feature of interest.

Definition 2. Let $(P_\eta \mid \eta \in N)$ be a regular exponential family of order $k$. The subfamily
induced by the set $M \subseteq N$ is an algebraic exponential family if there exists an open set
$\bar{N} \subseteq \mathbb{R}^k$, a diffeomorphism $g : N \to \bar{N}$, and a semi-algebraic set $A \subseteq \mathbb{R}^k$ such that
$M = g^{-1}(A \cap \bar{N})$.

Definition 2 allows one to consider algebraic structure arising after the regular exponential
family is reparametrized using the diffeomorphism $g$ (see Section 2.2 for a definition
of diffeomorphisms). Frequently, we will make use of the mean parametrization. Algebraic
exponential families appear to include all the existing competing definitions of algebraic
statistical models as special cases. Among the examples covered by Definition 2 are the
parametric models for discrete random variables studied by Pachter and Sturmfels (2005)
in the context of computational biology. Other models included in the framework are
conditional independence models with or without hidden variables for discrete or jointly
Gaussian random variables. Note that some work in algebraic statistics has focused on
discrete distributions corresponding to the boundary of the probability simplex (Geiger et al.,
2006). These distributions can be included in an extension of the regular exponential family
corresponding to the interior of the probability simplex; see Barndorff-Nielsen (1978,
pp. 154ff), Brown (1986, pp. 191ff), and Csiszár and Matúš (2005). Models given by
semi-algebraic subsets of the (closed) probability simplex can thus be termed 'extended algebraic
exponential families'.

In the remainder of the paper we will explain and exemplify our definition of algebraic 
exponential families. We begin in Section 2 by reviewing regular exponential families and 
in Example 9 we stress the fact that submodels of regular exponential families are only
well-behaved if the local geometry of their parameter spaces is sufficiently regular. In 
Section 3, we review some basic terminology and results on semi-algebraic sets, which do 
have nice local geometric properties, and introduce our algebraic exponential families. We 
also show that other natural formulations of an algebraic statistical model in the discrete 
case fall under this description and illustrate the generality using jointly normal random 
variables. We then illustrate how problems arising in statistical inference in algebraic mod- 
els can be addressed using computational algebraic geometry. Concretely, we discuss in 
Section 4 how so-called model invariants reveal aspects of the geometry of an algebraic 
statistical model that are connected to properties of statistical inference procedures such 
as likelihood ratio tests. As a second problem of a somewhat different flavour we show in 
Section 5 how systems of polynomial equations arising from likelihood equations can be 
solved algebraically.



2. Regular exponential families 

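Consider a sample space $\mathcal{X}$ with $\sigma$-algebra $\mathcal{A}$ on which is defined a $\sigma$-finite measure $\nu$.
Let $T : \mathcal{X} \to \mathbb{R}^k$ be a statistic, i.e., a measurable map. Define the natural parameter space

$$N = \left\{ \eta \in \mathbb{R}^k : \int_{\mathcal{X}} e^{\eta^t T(x)} \, d\nu(x) < \infty \right\}.$$

For $\eta \in N$, we can define a probability density $p_\eta$ on $\mathcal{X}$ as

$$p_\eta(x) = e^{\eta^t T(x) - \phi(\eta)},$$

where

$$\phi(\eta) = \log \int_{\mathcal{X}} e^{\eta^t T(x)} \, d\nu(x)$$

is the logarithm of the Laplace transform of the measure $\nu^T = \nu \circ T^{-1}$ that the statistic
$T$ induces on the Borel $\sigma$-algebra of $\mathbb{R}^k$. The support of $\nu^T$ is the intersection of all closed
sets $A \subseteq \mathbb{R}^k$ that satisfy $\nu^T(\mathbb{R}^k \setminus A) = 0$. Recall that the affine dimension of $A \subseteq \mathbb{R}^k$ is the
dimension of the linear space spanned by all differences $x - y$ of two vectors $x, y \in A$.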

Definition 3. Let $P_\eta$ be the probability measure on $(\mathcal{X}, \mathcal{A})$ that has $\nu$-density $p_\eta$. The
probability distributions $(P_\eta \mid \eta \in N)$ form a regular exponential family of order $k$ if $N$ is
an open set in $\mathbb{R}^k$ and the affine dimension of the support of $\nu^T$ is equal to $k$. The statistic
$T(x)$ that induces the regular exponential family is called a canonical sufficient statistic.

The order of a regular exponential family is unique and if the same family is represented
using two different canonical sufficient statistics then those two statistics are non-singular
affine transforms of each other (Brown, 1986, Thm. 1.9).



2.1. Examples 

Regular exponential families comprise families of discrete distributions, which were the 
subject of much of the work on algebraic statistics. 

Example 4 (Discrete data). Let the sample space $\mathcal{X}$ be the set of integers $\{1, \dots, m\}$.
Let $\nu$ be the counting measure on $\mathcal{X}$, i.e., the measure $\nu(A)$ of $A \subseteq \mathcal{X}$ is equal to the
cardinality of $A$. Consider the statistic $T : \mathcal{X} \to \mathbb{R}^{m-1}$,

$$T(x) = \left( I_{\{1\}}(x), \dots, I_{\{m-1\}}(x) \right),$$

whose zero-one components indicate which value in $\mathcal{X}$ the argument $x$ is equal to. In
particular, when $x = m$, $T(x)$ is the zero vector. The induced measure $\nu^T$ is a measure on
the Borel $\sigma$-algebra of $\mathbb{R}^{m-1}$ with support equal to the $m$ vectors in $\{0, 1\}^{m-1}$ that have
at most one non-zero component. The differences of these $m$ vectors include all canonical
basis vectors of $\mathbb{R}^{m-1}$. Hence, the affine dimension of the support of $\nu^T$ is equal to $m - 1$.
It holds for all $\eta \in \mathbb{R}^{m-1}$ that

$$\phi(\eta) = \log\left( 1 + \sum_{x=1}^{m-1} e^{\eta_x} \right) < \infty.$$

Hence, the natural parameter space $N$ is equal to all of $\mathbb{R}^{m-1}$ and in particular is open.
The $\nu$-density $p_\eta$ is a probability vector in $\mathbb{R}^m$. The components $p_\eta(x)$ for $1 \le x \le m - 1$
are positive and given by

$$p_\eta(x) = \frac{e^{\eta_x}}{1 + \sum_{y=1}^{m-1} e^{\eta_y}}.$$

The last component of $p_\eta$ is also positive and equals

$$p_\eta(m) = 1 - \sum_{x=1}^{m-1} p_\eta(x) = \frac{1}{1 + \sum_{x=1}^{m-1} e^{\eta_x}}.$$

The family of induced probability distributions $(P_\eta \mid \eta \in \mathbb{R}^{m-1})$ is a regular exponential
family of order $m - 1$. The interpretation of the natural parameters $\eta_x$ is one of log
odds because $p_\eta$ is equal to a given positive probability vector $(p_1, \dots, p_m)$ if and only
if $\eta_x = \log(p_x / p_m)$ for $x = 1, \dots, m - 1$. This establishes a correspondence between the
natural parameter space $N = \mathbb{R}^{m-1}$ and the interior of the $m - 1$ dimensional probability
simplex. □
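
The log-odds correspondence of Example 4 can be checked numerically. The following is a minimal sketch and not part of the original development; the function names `natural_to_prob` and `prob_to_natural` are hypothetical, and only the Python standard library is assumed.

```python
import math

def natural_to_prob(eta):
    """Map natural parameters eta (length m-1) to the probability vector p_eta (length m)."""
    # phi(eta) = log(1 + sum_x exp(eta_x)) is the log-Laplace transform of Example 4.
    phi = math.log(1.0 + sum(math.exp(e) for e in eta))
    probs = [math.exp(e - phi) for e in eta]   # p_eta(x) for x = 1, ..., m-1
    probs.append(math.exp(-phi))               # p_eta(m) = 1 / (1 + sum_x exp(eta_x))
    return probs

def prob_to_natural(p):
    """Map a positive probability vector p (length m) to the log odds eta_x = log(p_x / p_m)."""
    return [math.log(px / p[-1]) for px in p[:-1]]

p = [0.2, 0.3, 0.5]
eta = prob_to_natural(p)        # log odds relative to the last category
p_back = natural_to_prob(eta)   # recovers p up to rounding error
print(eta)
print(p_back)
```

Composing the two maps in either order returns the input, which is the bijection between $N = \mathbb{R}^{m-1}$ and the open simplex described above.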

The other distributional framework that has seen application of algebraic geometry is 
that of multivariate normal distributions. 

Example 5 (Normal distribution). Let the sample space $\mathcal{X}$ be Euclidean space $\mathbb{R}^p$ equipped
with its Borel $\sigma$-algebra and Lebesgue measure $\nu$. Consider the statistic
$T : \mathcal{X} \to \mathbb{R}^p \times \mathbb{R}^{p(p+1)/2}$ given by

$$T(x) = \left( x_1, \dots, x_p, -x_1^2/2, \dots, -x_p^2/2, -x_1 x_2, \dots, -x_{p-1} x_p \right)^t.$$

The polynomial functions that form the components of $T(x)$ are linearly independent and
thus the support of $\nu^T$ has the full affine dimension $p + p(p+1)/2$.

If $\eta \in \mathbb{R}^p \times \mathbb{R}^{p(p+1)/2}$, then write $\eta_{[p]} \in \mathbb{R}^p$ for the vector of the first $p$ components $\eta_i$,
$1 \le i \le p$. Similarly, write $\eta_{[p \times p]}$ for the symmetric $p \times p$-matrix formed from the last
$p(p+1)/2$ components $\eta_{ij}$, $1 \le i \le j \le p$. The function $x \mapsto e^{\eta^t T(x)}$ is $\nu$-integrable if and
only if $\eta_{[p \times p]}$ is positive definite. Hence, the natural parameter space $N$ is equal to the
Cartesian product of $\mathbb{R}^p$ and the cone of positive definite $p \times p$-matrices. If $\eta$ is in the open
set $N$, then

$$\phi(\eta) = -\frac{1}{2} \left( \log\det(\eta_{[p \times p]}) - \eta_{[p]}^t \eta_{[p \times p]}^{-1} \eta_{[p]} - p \log(2\pi) \right).$$

The Lebesgue densities can be written as

$$p_\eta(x) = \frac{1}{\sqrt{(2\pi)^p \det(\eta_{[p \times p]}^{-1})}} \exp\left\{ \eta_{[p]}^t x - \mathrm{trace}(\eta_{[p \times p]} x x^t)/2 - \eta_{[p]}^t \eta_{[p \times p]}^{-1} \eta_{[p]}/2 \right\}.$$

Setting $\Sigma = \eta_{[p \times p]}^{-1}$ and $\mu = \eta_{[p \times p]}^{-1} \eta_{[p]}$, we find that

$$p_\eta(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\left\{ -\frac{1}{2} (x - \mu)^t \Sigma^{-1} (x - \mu) \right\}$$

is the density of the multivariate normal distribution $\mathcal{N}_p(\mu, \Sigma)$. Hence, the family of all
multivariate normal distributions on $\mathbb{R}^p$ with positive definite covariance matrix is a regular
exponential family of order $p + p(p+1)/2$. □
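
As a sanity check of Example 5, the sketch below evaluates the exponential-family form $\exp\{\eta^t T(x) - \phi(\eta)\}$ for $p = 2$ and compares it with the usual bivariate normal density. It is an illustration under stated assumptions: the function names are hypothetical, the $2 \times 2$ inverse is written out by hand, and the sign convention for the quadratic components of $T$ is the one implied by the density formula above.

```python
import math

def natural_parameters(mu, Sigma):
    """Natural parameters for p = 2: eta_[p] = K mu and eta_[pxp] = K = Sigma^{-1}."""
    (a, b), (_, d) = Sigma                      # symmetric covariance [[a, b], [b, d]]
    det = a * d - b * b
    K = [[d / det, -b / det], [-b / det, a / det]]
    eta_vec = [K[0][0] * mu[0] + K[0][1] * mu[1],
               K[1][0] * mu[0] + K[1][1] * mu[1]]
    return eta_vec, K

def density_exponential_form(x, eta_vec, K):
    """Evaluate exp(eta^t T(x) - phi(eta)) with T as in Example 5 (p = 2)."""
    T = [x[0], x[1], -x[0] ** 2 / 2, -x[1] ** 2 / 2, -x[0] * x[1]]
    eta = [eta_vec[0], eta_vec[1], K[0][0], K[1][1], K[0][1]]
    det_K = K[0][0] * K[1][1] - K[0][1] ** 2
    # eta_[p]^t K^{-1} eta_[p], using the explicit inverse of the 2x2 matrix K
    quad = (K[1][1] * eta_vec[0] ** 2 - 2 * K[0][1] * eta_vec[0] * eta_vec[1]
            + K[0][0] * eta_vec[1] ** 2) / det_K
    phi = -0.5 * (math.log(det_K) - quad - 2 * math.log(2 * math.pi))
    return math.exp(sum(e * t for e, t in zip(eta, T)) - phi)

def density_standard_form(x, mu, Sigma):
    """Usual bivariate normal density N_2(mu, Sigma)."""
    (a, b), (_, d) = Sigma
    det = a * d - b * b
    u, v = x[0] - mu[0], x[1] - mu[1]
    quad = (d * u * u - 2 * b * u * v + a * v * v) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

mu, Sigma, x = [1.0, -0.5], [[2.0, 0.5], [0.5, 1.0]], [0.3, 0.7]
eta_vec, K = natural_parameters(mu, Sigma)
print(density_exponential_form(x, eta_vec, K), density_standard_form(x, mu, Sigma))
```

The two printed values agree up to floating-point error, reflecting the reparametrization $\Sigma = \eta_{[p \times p]}^{-1}$, $\mu = \eta_{[p \times p]}^{-1} \eta_{[p]}$.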

The structure of a regular exponential family remains essentially unchanged when sam- 
pling independent and identically distributed observations. 

Example 6 (Samples). A sample $X_1, \dots, X_n$ from $P_\eta$ comprises independent random
vectors, all distributed according to $P_\eta$. Denote their joint distribution by $\otimes_{i=1}^n P_\eta$. An
important property of a regular exponential family $(P_\eta \mid \eta \in N)$ of order $k$ is that the
induced family $(\otimes_{i=1}^n P_\eta \mid \eta \in N)$ is again a regular exponential family of order $k$ with
canonical sufficient statistic $\sum_{i=1}^n T(X_i)$ and log-Laplace transform $n\phi(\eta)$. For discrete data
as discussed in Example 4 the canonical sufficient statistic is given by the vector of counts

$$N_x = \sum_{i=1}^n I_{\{x\}}(X_i), \qquad x = 1, \dots, m - 1.$$

For the normal distribution in Example 5 the canonical sufficient statistic is in correspondence
with the empirical mean vector $\bar{X}$ and the empirical covariance matrix $S$; compare

TO- □ 



2.2. Likelihood inference in regular exponential families 

Among the nice properties of regular exponential families is their behavior in likelihood
inference. Suppose the random vector $X$ is distributed according to some unknown distribution
from a regular exponential family $(P_\eta \mid \eta \in N)$ of order $k$ with canonical sufficient
statistic $T$. Given an observation $X = x$, the log-likelihood function takes the form

$$\ell(\eta \mid T(x)) = \eta^t T(x) - \phi(\eta).$$
The log-Laplace transform $\phi$ is a strictly convex and smooth, that is, infinitely many times
differentiable, function on the convex set $N$ (Brown, 1986, Thm. 1.13, Thm. 2.2, Cor. 2.3).
The derivatives of $\phi$ yield the moments of the canonical sufficient statistic such as the
expectation and covariance matrix,

$$\zeta(\eta) := \frac{d}{d\eta} \phi(\eta) = E_\eta[T(X)], \qquad (2.1)$$

$$\Sigma(\eta) := \frac{d^2}{d\eta \, d\eta^t} \phi(\eta) = E_\eta\left\{ [T(X) - \zeta(\eta)] [T(X) - \zeta(\eta)]^t \right\}.$$

The matrix $\Sigma(\eta)$ is positive definite since the components of $T(X)$ may not exhibit a linear
relationship that holds almost everywhere.
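
The moment identities can be illustrated numerically for the discrete family of Example 4. The sketch below is an assumption-laden illustration rather than a method from the paper: it approximates the derivatives of $\phi$ by central finite differences (the helper names `gradient` and `hessian` are hypothetical) and compares them with the mean and covariance of the indicator vector $T(X)$ for $p = (0.2, 0.3, 0.5)$.

```python
import math

def phi(eta):
    """Log-Laplace transform of the discrete family in Example 4."""
    return math.log(1.0 + sum(math.exp(e) for e in eta))

def gradient(f, eta, h=1e-5):
    """Central finite-difference approximation of the gradient of f at eta."""
    grad = []
    for i in range(len(eta)):
        up, down = list(eta), list(eta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def hessian(f, eta, h=1e-4):
    """Finite-difference Hessian obtained by differencing the gradient."""
    k = len(eta)
    rows = []
    for j in range(k):
        up, down = list(eta), list(eta)
        up[j] += h
        down[j] -= h
        g_up, g_down = gradient(f, up, h), gradient(f, down, h)
        rows.append([(g_up[i] - g_down[i]) / (2 * h) for i in range(k)])
    return rows

eta = [math.log(0.4), math.log(0.6)]      # natural parameters of p = (0.2, 0.3, 0.5)
zeta = gradient(phi, eta)                 # approximates E[T(X)] = (p_1, p_2)
Sigma = hessian(phi, eta)                 # approximates Cov[T(X)] = diag(p) - p p^t
Sigma_exact = [[0.2 * 0.8, -0.2 * 0.3], [-0.2 * 0.3, 0.3 * 0.7]]
print(zeta)
print(Sigma)
```

The approximate gradient is close to $(0.2, 0.3)$ and the approximate Hessian is close to the multinomial covariance, in line with (2.1).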

The strict convexity of $\phi$ implies strict concavity of the log-likelihood function $\ell$. Hence,
if the maximum likelihood estimator (MLE)

$$\hat{\eta}(T(x)) = \arg\max_{\eta \in N} \ell(\eta \mid T(x))$$

exists then it is the unique local and global maximizer of $\ell$ and can be obtained as the
unique solution of the likelihood equations $\zeta(\eta) = T(x)$. The existence of $\hat{\eta}(T(x))$ is
equivalent to the condition $T(x) \in \zeta(N)$; the open set $\zeta(N)$ is equal to the interior of the
convex hull of the support of $\nu^T$ (Brown, 1986, Thm. 5.5).

If $X_1, \dots, X_n$ are a sample of random vectors drawn from $P_\eta$, then the previous
discussion applies to the family $(\otimes_{i=1}^n P_\eta \mid \eta \in N)$. In particular, the likelihood equations
become

$$n \zeta(\eta) = \sum_{i=1}^n T(X_i) \iff \zeta(\eta) = \frac{1}{n} \sum_{i=1}^n T(X_i) =: \bar{T}.$$

By the strong law of large numbers, $\bar{T}$ converges almost surely to the true parameter point
$\zeta(\eta_0) \in \zeta(N)$. It follows that the probability of existence of the MLE, $\mathrm{Prob}_{\eta_0}(\bar{T} \in \zeta(N))$,
tends to one as the sample size $n$ tends to infinity. Moreover, the mean parametrization
map $\eta \mapsto \zeta(\eta)$ is a bijection from $N$ to $\zeta(N)$ that has a differentiable inverse with total
derivative

$$\frac{d}{d\zeta} \zeta^{-1}(\zeta) \Big|_{\zeta = \zeta(\eta)} = \Sigma(\eta)^{-1},$$

which implies in conjunction with an application of the central limit theorem:

Proposition 7. The MLE $\hat{\eta}(\bar{T}) = \zeta^{-1}(\bar{T})$ in a regular exponential family is asymptotically
normal in the sense that if $\eta_0$ is the true parameter, then

$$\sqrt{n} \left[ \hat{\eta}(\bar{T}) - \eta_0 \right] \xrightarrow{d} \mathcal{N}_k\left( 0, \Sigma(\eta_0)^{-1} \right).$$
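
For the discrete family of Example 4 the likelihood equations $\zeta(\eta) = \bar{T}$ can be solved in closed form, and the existence condition $\bar{T} \in \zeta(N)$ says that every value $1, \dots, m$ has been observed. The following minimal sketch illustrates this; the function name `discrete_mle` and the toy samples are hypothetical.

```python
import math
from collections import Counter

def discrete_mle(sample, m):
    """MLE of the natural parameter for data in {1, ..., m}; None if it does not exist."""
    counts = Counter(sample)
    n = len(sample)
    # zeta(N) is the open probability simplex, so the MLE exists if and only if
    # each of the m values occurs at least once in the sample.
    if any(counts[x] == 0 for x in range(1, m + 1)):
        return None
    t_bar = [counts[x] / n for x in range(1, m)]   # averaged sufficient statistic
    last = counts[m] / n                            # relative frequency of the value m
    # Solving zeta(eta) = t_bar gives the empirical log odds.
    return [math.log(t / last) for t in t_bar]

eta_hat = discrete_mle([1, 1, 2, 3, 3, 3, 3, 2, 1, 3], m=3)
print(eta_hat)                           # [log(3/5), log(2/5)]
print(discrete_mle([1, 1, 2], m=3))      # None: the value 3 was never observed
```

When a category is missing, the averaged statistic lies on the boundary of the convex hull of the support, and the likelihood has no maximizer in $N$.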

A submodel of a regular exponential family $(P_\eta \mid \eta \in N)$ of order $k$ is given by a
subset $M \subseteq N$. If the geometry of the set $M$ is regular enough, then the submodel may
inherit the favorable properties of likelihood inference from its reference model, the regular
exponential family. The nicest possible case occurs when the submodel $(P_\eta \mid \eta \in M)$
has parameter space $M = N \cap L$, where $L \subseteq \mathbb{R}^k$ is an affine subspace of $\mathbb{R}^k$. Altering
the canonical sufficient statistics one finds that $(P_\eta \mid \eta \in M)$ forms a regular exponential
family of order $\dim(L)$.

Given a single observation $X$ from $P_\eta$, the likelihood ratio test for testing $H_0 : \eta \in M$
versus $H_1 : \eta \in N \setminus M$ rejects $H_0$ for large values of the likelihood ratio statistic

$$\lambda_M(T(X)) = 2 \left( \sup_{\eta \in N} \ell(\eta \mid T(X)) - \sup_{\eta \in M} \ell(\eta \mid T(X)) \right),$$

where the factor 2 is the normalization under which the chi-square limits below hold.
If we observe a sample $X_1, \dots, X_n$ from $P_\eta$, then the likelihood ratio statistic depends on
$\bar{T}$ only and is equal to $n\lambda_M(\bar{T})$. For a rejection decision, the distribution of $n\lambda_M(\bar{T})$ can
often be approximated using the next asymptotic result.

Proposition 8. If $M = N \cap L$ for an affine space $L$ and the true parameter $\eta_0$ is in $M$,
then the likelihood ratio statistic $n\lambda_M(\bar{T})$ converges to $\chi^2_{k - \dim(L)}$, the chi-square distribution
with $k - \dim(L)$ degrees of freedom, as $n \to \infty$.

In order to obtain asymptotic results such as uniformly valid chi-square asymptotics for
the likelihood ratio statistic, the set $M$ need not be given by an affine subspace. In fact, if
$M$ is an $m$-dimensional smooth manifold in $\mathbb{R}^k$, then $n\lambda_M(\bar{T})$ still converges in distribution
to $\chi^2_{k-m}$ for any $\eta_0 \in M$. A set $M$ is an $m$-dimensional smooth manifold if for all $\eta_0 \in M$
there exists an open set $U \subseteq \mathbb{R}^k$ containing $\eta_0$, an open set $V \subseteq \mathbb{R}^k$, and a diffeomorphism
$g : V \to U$ such that $g(V \cap (\mathbb{R}^m \times \{0\})) = U \cap M$. Here, $\mathbb{R}^m \times \{0\} \subseteq \mathbb{R}^k$ is the subset of vectors
for which the last $k - m$ components are equal to zero. A diffeomorphism $g : V \to U$ is
a smooth bijective map that has a smooth inverse $g^{-1} : U \to V$. An exponential family
induced by a smooth manifold in the natural parameter space is commonly termed a curved
exponential family; see Kass and Vos (1997) for an introduction to this topic.

The fact that many interesting statistical models, in particular models involving hidden
variables, are not curved exponential families calls for generalization. One attempt at
such generalization was made by Geiger et al. (2001), who introduce so-called stratified
exponential families. A stratified exponential family is obtained by piecing together several
curved exponential families. However, as the next example shows, stratified exponential
families appear to be a bit too general as a framework unless more conditions are imposed
on how the curved exponential families are joined together. Example 9 is inspired by an
example in Rockafellar and Wets (1998, p. 199).



Example 9. Consider the regular exponential family $\mathcal{P}$ of bivariate normal distributions
with unknown mean vector $\mu = (\mu_1, \mu_2)^t \in \mathbb{R}^2$ but covariance matrix $\Sigma$ equal to the
identity matrix $I_2 \in \mathbb{R}^{2 \times 2}$. The natural parameter space of this model is the plane $\mathbb{R}^2$.
When drawing a sample $X_1, \dots, X_n$ from a distribution in $\mathcal{P}$, the canonical statistic is
the sum of the random vectors. Dividing by the sample size $n$ yields the sample mean
vector $\bar{X} \in \mathbb{R}^2$, which is also the MLE of $\mu$. In the following we will assume that the
true parameter $\mu_0$ is equal to the origin. Then the rescaled sample mean vector $\sqrt{n}\bar{X}$ is
distributed according to the bivariate standard normal distribution $\mathcal{N}_2(0, I_2)$.

If we define a submodel $\mathcal{P}_C \subseteq \mathcal{P}$ by restricting the mean vector to lie in a closed set
$C \subseteq \mathbb{R}^2$, then the MLE $\hat{\mu}$ for the model $\mathcal{P}_C$ is the point in $C$ that is closest to $\bar{X}$ in
Euclidean distance. For $n = 1$, the likelihood ratio statistic $\lambda_C(\bar{X})$ for testing $\mu \in C$
versus $\mu \notin C$ is equal to the squared Euclidean distance between $\bar{X}$ and $C$. Hence, the
likelihood ratio statistic based on an $n$-sample is

$$n\lambda_C(\bar{X}) = n \cdot \min_{\mu \in C} \|\bar{X} - \mu\|^2 = \min_{\mu \in \sqrt{n}C} \|\sqrt{n}\bar{X} - \mu\|^2,$$

i.e., the squared Euclidean distance between the standard normal random vector $\sqrt{n}\bar{X}$
and the rescaled set $\sqrt{n}C$.

As a concrete choice of a submodel, consider the set

$$C_1 = \{(\mu_1, \mu_2)^t \in \mathbb{R}^2 \mid \mu_2 = \mu_1 \sin(1/\mu_1), \ \mu_1 \neq 0\} \cup \{(0, 0)^t\}.$$

This set is the disjoint union of the two one-dimensional smooth manifolds obtained by
taking $\mu_1 < 0$ and $\mu_1 > 0$, and the zero-dimensional smooth manifold given by the origin.
These manifolds form a stratification of $C_1$ (Geiger et al., 2001, p. 513), and thus the
model $\mathcal{P}_{C_1}$ constitutes a stratified exponential family. In Figure 1 we plot three of the
sets $\sqrt{n}C_1$ for the choices $n = 100, 100^2, 100^3$. The range of the plot is restricted to
the square $[-3, 3]^2$, which contains the majority of the mass of the bivariate standard
normal distribution. The figure illustrates the fact that as $n$ tends to infinity the sets
$\sqrt{n}C_1$ fill more and more densely the 2-dimensional cone comprised between the axes
$\mu_2 = \pm\mu_1$. Hence, $n\lambda_{C_1}(\bar{X})$ converges in distribution to the squared Euclidean distance
between a bivariate standard normal point and this cone. So although we pieced together
smooth manifolds of codimension 1 or larger, the limiting distribution of the likelihood
ratio statistic is obtained from a distance to a full-dimensional cone.
As a second submodel consider the one induced by the set

$$C_2 = \{(\mu_1, \mu_2)^t \in \mathbb{R}^2 \mid \mu_2 = \mu_1 \sin(-\log(|\mu_1|/4)), \ \mu_1 \in [-3, 3] \setminus \{0\}\} \cup \{(0, 0)^t\}.$$

The model $\mathcal{P}_{C_2}$ is again a stratified exponential family. However, now the sets $\sqrt{n}C_2$ have
a wave-like structure even for large sample sizes $n$; compare Figure 2. We conclude that in
this example the likelihood ratio test statistic $n\lambda_{C_2}(\bar{X})$ does not converge in distribution
as $n$ tends to infinity. □
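
The limiting behavior for $C_1$ can be explored by simulation. The sketch below is an illustration under stated assumptions, not a computation from the paper: it takes the limiting cone to be $\{|\mu_2| \le |\mu_1|\}$, computes the squared Euclidean distance of a bivariate standard normal draw to it, and estimates how often that distance is exactly zero. The function name `sq_dist_to_cone`, the seed, and the number of draws are arbitrary choices.

```python
import random

def sq_dist_to_cone(z1, z2):
    """Squared Euclidean distance from (z1, z2) to the cone {|mu_2| <= |mu_1|}."""
    gap = abs(z2) - abs(z1)
    # Inside the cone the distance is zero; outside, the nearest point lies on
    # one of the boundary lines mu_2 = +/- mu_1, at distance gap / sqrt(2).
    return 0.0 if gap <= 0.0 else gap * gap / 2.0

rng = random.Random(0)
draws = [sq_dist_to_cone(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]
share_zero = sum(d == 0.0 for d in draws) / len(draws)
print(share_zero)
```

Roughly half of the simulated distances are zero, since the cone covers half of the plane by symmetry. A chi-square distribution with one degree of freedom, which would be the limit for a one-dimensional smooth manifold, puts no mass at zero.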
Figure 1: Sets $\sqrt{n}C_1$ for $n = 100, 100^2, 100^3$.

Figure 2: Sets $\sqrt{n}C_2$ for $n = 100, 100^2, 100^3$.

The failure in the previous example of nice asymptotic behavior of the likelihood ratio 
test is part of our motivation for restricting to the class of algebraic exponential families, 
which we introduce next. 

3. Algebraic exponential families 

In the following definition, which was anticipated in the introduction, we propose the 
use of semi-algebraic sets to unify different definitions of algebraic statistical models. Using
semi-algebraic sets rules out phenomena such as those created in Example 9 because these sets
have nice local geometric properties. In addition, imposing algebraic structure allows one 
to employ the tools of computational algebraic geometry to address questions arising in 
statistical inference. (More details on both these points are given in Section 4.) 

Definition 2. Let $(P_\eta \mid \eta \in N)$ be a regular exponential family of order $k$. The subfamily
induced by the set $M \subseteq N$ is an algebraic exponential family if there exists an open set
$\bar{N} \subseteq \mathbb{R}^k$, a diffeomorphism $g : N \to \bar{N}$, and a semi-algebraic set $A \subseteq \mathbb{R}^k$ such that
$M = g^{-1}(A \cap \bar{N})$.

The definition states that an algebraic exponential family is given by a semi-algebraic 
subset of the parameter space of a regular exponential family. However, this parameter 
space may be obtained by a reparametrization g of the natural parameter space N, which 
provides the necessary flexibility to capture the algebraic structure found in interesting
statistical models including ones that do not form curved exponential families. The mean
parametrization $\zeta(\eta)$ is one example of a useful reparametrization.

Before giving examples of algebraic exponential families we provide some background on
semi-algebraic sets; more in-depth introductions can be found, for example, in
Benedetti and Risler (1990) or Bochnak et al. (1998).



3.1. Basic facts about semi-algebraic sets 

A monomial in indeterminates (polynomial variables) $t_1, \dots, t_n$ is a formal expression
of the form $t^\beta = t_1^{\beta_1} t_2^{\beta_2} \cdots t_n^{\beta_n}$ where $\beta = (\beta_1, \dots, \beta_n)$ is the non-negative integer vector of
exponents. A polynomial

$$f = \sum_{\beta \in B} c_\beta t^\beta$$

is a linear combination of monomials where the coefficients $c_\beta$ are in a fixed field $\mathbb{K}$ and
$B \subseteq \mathbb{N}^n$ is a finite set of exponent vectors. The collection of all polynomials in the
indeterminates $t_1, \dots, t_n$ with coefficients in a fixed field $\mathbb{K}$ is the set $\mathbb{K}[t] = \mathbb{K}[t_1, \dots, t_n]$.
The collection of polynomials $\mathbb{K}[t]$ has the algebraic structure of a ring. Each polynomial in
$\mathbb{K}[t]$ is a formal linear combination of monomials that can also be considered as a function
$f : \mathbb{K}^n \to \mathbb{K}$, defined by evaluation. Throughout the paper, we will focus attention on the
ring $\mathbb{R}[t]$ of polynomials with real coefficients.

Definition 10. A basic semi-algebraic set is a subset of points in $\mathbb{R}^n$ of the form

$$A = \{\theta \in \mathbb{R}^n \mid f(\theta) > 0 \ \forall f \in F, \ h(\theta) = 0 \ \forall h \in H\}$$

where $F \subseteq \mathbb{R}[t]$ is a finite (possibly empty) collection of polynomials and $H \subseteq \mathbb{R}[t]$ is an
arbitrary (possibly empty) collection of polynomials. A semi-algebraic set is a finite union
of basic semi-algebraic sets. If $F = \emptyset$ then $A$ is called a real algebraic variety.

A particular special case of a general semi-algebraic set occurs when we consider sets
of the form

$$A = \{\theta \in \mathbb{R}^n \mid f(\theta) > 0 \ \forall f \in F, \ g(\theta) \ge 0 \ \forall g \in G, \ h(\theta) = 0 \ \forall h \in H\}$$

where both $F$ and $G$ are finite collections of real polynomials.

Example 11. The open probability simplex for discrete random variables is a basic
semi-algebraic set, where $F = \{t_i \mid i = 1, \dots, n - 1\} \cup \{1 - \sum_{i=1}^{n-1} t_i\}$ and $H = \emptyset$. More generally,
the relative interior of any convex polyhedron in any dimension is a basic semi-algebraic
set, while the whole polyhedron is an ordinary semi-algebraic set. □

Example 12. The set $S \subseteq \mathbb{R}^{m \times m}$ of positive definite matrices is a basic semi-algebraic
set, where $F$ consists of all principal subdeterminants of a symmetric matrix and $G$ is
the empty set. □
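
Membership in the sets of Examples 11 and 12 amounts to checking finitely many strict polynomial inequalities. The following sketch makes this concrete with exact rational arithmetic; the function names are hypothetical, the determinant is computed by Laplace expansion (adequate only for small matrices), and symmetry is checked separately as a set of linear equality constraints.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 0:
        return 1
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def in_open_simplex(t):
    """Example 11: all t_i > 0 and 1 - sum(t) > 0."""
    return all(ti > 0 for ti in t) and 1 - sum(t) > 0

def is_positive_definite(S):
    """Example 12: S symmetric and every principal subdeterminant strictly positive."""
    m = len(S)
    symmetric = all(S[i][j] == S[j][i] for i in range(m) for j in range(m))
    minors_positive = all(
        det([[S[i][j] for j in idx] for i in idx]) > 0
        for r in range(1, m + 1)
        for idx in combinations(range(m), r)
    )
    return symmetric and minors_positive

print(in_open_simplex([Fraction(1, 4), Fraction(1, 4)]))       # True
print(in_open_simplex([Fraction(1, 2), Fraction(1, 2)]))       # False: boundary point
print(is_positive_definite([[2, 1, 0], [1, 2, 1], [0, 1, 2]])) # True
print(is_positive_definite([[1, 2], [2, 1]]))                  # False: determinant is -3
```

Each test is a finite conjunction of conditions of the form $f(\theta) > 0$, which is exactly the shape required by Definition 10.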

In our introduction, parametrically specified statistical models were claimed to be algebraic
statistical models. This non-trivial claim holds due to the famous Tarski-Seidenberg
theorem (e.g. Bochnak et al., 1998), which says that the image of a semi-algebraic set under
any nice enough mapping is again a semi-algebraic set. To make this precise we need
to define the class of mappings of interest.

Let $\psi_1 = f_1/g_1, \dots, \psi_n = f_n/g_n$ be rational functions where $f_i, g_i \in \mathbb{R}[t] = \mathbb{R}[t_1, \dots, t_d]$
are real polynomial functions. These rational functions can be used to define a rational
map

$$\psi : \mathbb{R}^d \to \mathbb{R}^n, \qquad a \mapsto (\psi_1(a), \dots, \psi_n(a)),$$

which is well-defined on the open set

$$D_\psi = \left\{ a \in \mathbb{R}^d : \prod_{i=1}^n g_i(a) \neq 0 \right\}.$$

Theorem 13 (Tarski-Seidenberg). Let $A \subseteq \mathbb{R}^d$ be a semi-algebraic set and $\psi$ a rational
map that is well-defined on $A$, that is, $A \subseteq D_\psi$. Then the image $\psi(A)$ is a semi-algebraic
set.



Pachter and Sturmfels (2005) define an algebraic statistical model as the image of a
polynomial parametrization $\psi(A) \subseteq \Delta$ where $A$ is the interior of a polyhedron and $\Delta$ is
the probability simplex. The emphasis on such models, which one might call parametric
algebraic statistical models, results from the fact that most models used in the biological
applications under consideration (sequence alignment and phylogenetic tree reconstruction,
to name two) are parametric models for discrete random variables. Furthermore, the precise
algebraic form of these parametric models is essential to parametric maximum a posteriori
estimation, one of the major themes in the text of Pachter and Sturmfels (2005). The
Tarski-Seidenberg theorem and Example 0] yield the following unifying fact. 

Corollary 14. If a parametric statistical model for discrete random variables is a well-defined
image of a rational map from a semi-algebraic set to the probability simplex, then
the model is an algebraic exponential family.



3.2. Independence models as examples 

Many statistical models are defined based on considerations of (conditional) independence.
Examples include Markov chain models, models for testing independence hypotheses
in contingency tables and graphical models, see e.g. Lauritzen (1996). As we show next,
conditional independence yields algebraic exponential families in both the Gaussian and
discrete cases. The algebraic structure also passes through under marginalization, as we
will illustrate in Section 4.
Example 15 (Conditional independence in normal distributions). Let $X = (X_1, \dots, X_p)$
be a random vector with joint normal distribution $\mathcal{N}_p(\mu, \Sigma)$ with mean vector $\mu \in \mathbb{R}^p$ and
positive definite covariance matrix $\Sigma$. For three pairwise disjoint index sets $A, B, C \subseteq
\{1, \dots, p\}$, the subvectors $X_A$ and $X_B$ are conditionally independent given $X_C$, in symbols
$X_A \perp\!\!\!\perp X_B \mid X_C$, if and only if

$$\det\left( \Sigma_{(\{i\} \cup C) \times (\{j\} \cup C)} \right) = 0 \quad \text{for all } i \in A, \ j \in B.$$

If $C = \emptyset$, then conditional independence given $X_\emptyset$ is understood to mean marginal
independence of $X_A$ and $X_B$. □
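
The determinantal criterion of Example 15 can be tested on a small covariance matrix. The sketch below uses the hypothetical choice $\sigma_{ij} = \rho^{|i-j|}$ with $\rho = 1/2$ and $p = 3$, which is the covariance of a stationary Gaussian Markov chain, so that $X_1 \perp\!\!\!\perp X_3 \mid X_2$ should hold. Function names are illustrative and indices are 0-based.

```python
from fractions import Fraction

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def ci_determinant(Sigma, i, j, C):
    """Determinant of the submatrix of Sigma with rows {i} union C and columns {j} union C."""
    rows = [i] + list(C)
    cols = [j] + list(C)
    return det([[Sigma[r][c] for c in cols] for r in rows])

rho = Fraction(1, 2)
Sigma = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]

print(ci_determinant(Sigma, 0, 2, [1]))   # 0: X_1 and X_3 are independent given X_2
print(ci_determinant(Sigma, 0, 2, []))    # 1/4: X_1 and X_3 are not marginally independent
print(ci_determinant(Sigma, 0, 1, [2]))   # 3/8: X_1 and X_2 are not independent given X_3
```

Exact rational arithmetic is used so that a vanishing determinant shows up as an exact zero rather than a small floating-point number.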

Example 16 (Conditional independence in the discrete case). Conditional independence
statements also have a natural algebraic interpretation in the discrete case. As the simplest
example, consider the conditional independence statement $X_1 \perp\!\!\!\perp X_2 \mid X_3$ for the discrete
random vector $(X_1, X_2, X_3)$. This translates into the collection of algebraic constraints on
the joint probability distribution

$$\mathrm{Prob}(X_1 = i_1, X_2 = j_1, X_3 = k) \cdot \mathrm{Prob}(X_1 = i_2, X_2 = j_2, X_3 = k)$$
$$= \mathrm{Prob}(X_1 = i_1, X_2 = j_2, X_3 = k) \cdot \mathrm{Prob}(X_1 = i_2, X_2 = j_1, X_3 = k)$$

for all $i_1, i_2 \in [m_1]$, $j_1, j_2 \in [m_2]$ and $k \in [m_3]$, where $[m] = \{1, 2, \dots, m\}$. Alternatively,
we might write this in a more compact algebraic way as:

$$p_{i_1 j_1 k} \, p_{i_2 j_2 k} - p_{i_1 j_2 k} \, p_{i_2 j_1 k} = 0,$$

where $p_{ijk}$ is shorthand for $\mathrm{Prob}(X_1 = i, X_2 = j, X_3 = k)$. In general, any collection
of conditional independence statements for discrete random variables corresponds to a
collection of quadratic polynomial constraints on the components of the joint probability
vector. □
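
The quadratic constraints of Example 16 are easy to verify for a concrete table. The following sketch builds a hypothetical $2 \times 2 \times 2$ distribution of the form $p_{ijk} = \mathrm{Prob}(X_3 = k)\,\mathrm{Prob}(X_1 = i \mid k)\,\mathrm{Prob}(X_2 = j \mid k)$, for which $X_1 \perp\!\!\!\perp X_2 \mid X_3$ holds by construction, and checks all constraints exactly. The numbers and the function name are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def ci_constraints_hold(p, m1, m2, m3):
    """Check p[i1][j1][k] p[i2][j2][k] - p[i1][j2][k] p[i2][j1][k] == 0 for all indices."""
    return all(
        p[i1][j1][k] * p[i2][j2][k] - p[i1][j2][k] * p[i2][j1][k] == 0
        for i1, i2 in product(range(m1), repeat=2)
        for j1, j2 in product(range(m2), repeat=2)
        for k in range(m3)
    )

pk = [Fraction(1, 3), Fraction(2, 3)]                                       # P(X3 = k)
p1 = [[Fraction(1, 4), Fraction(3, 4)], [Fraction(1, 2), Fraction(1, 2)]]   # p1[k][i]
p2 = [[Fraction(1, 5), Fraction(4, 5)], [Fraction(2, 3), Fraction(1, 3)]]   # p2[k][j]
p = [[[pk[k] * p1[k][i] * p2[k][j] for k in range(2)] for j in range(2)] for i in range(2)]

# A 2 x 2 x 1 table concentrated on the diagonal violates the constraints.
q = [[[Fraction(1, 2)], [Fraction(0)]], [[Fraction(0)], [Fraction(1, 2)]]]

print(ci_constraints_hold(p, 2, 2, 2))   # True
print(ci_constraints_hold(q, 2, 2, 1))   # False
```

For each fixed $k$ the constraints say that the $m_1 \times m_2$ slice $(p_{ijk})_{i,j}$ has vanishing $2 \times 2$ minors.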



4. Model geometry 

Of fundamental importance to statistical inference is the intuitive notion of the "shape" 
of a statistical model, reflected in its abstract geometrical properties. Examples of interest- 
ing geometrical features are whether or not the likelihood function is multimodal, whether 
or not the model has singularities (is non-regular) and the nature of the underlying sin- 
gularities. These are all part of answering the question: How does the geometry of the 
model reflect its statistical features? When the model is an algebraic exponential family, 
these problems can be addressed using algebraic techniques, in particular by computing 
with ideals. This is true even when the model comes in a parametric form; however, it is
then often helpful to translate to an implicit representation of the model.

4.1 Model invariants 

Recall that an ideal $I \subseteq \mathbb{R}[t]$ is a collection of polynomials such that for all $f, g \in I$,
$f + g \in I$, and for all $f \in I$ and $h \in \mathbb{R}[t]$, $h \cdot f \in I$. Ideals can be used to determine real
algebraic varieties by computing the zero set of the ideal:

\[
V(I) = \{ a \in \mathbb{R}^n \mid f(a) = 0 \text{ for all } f \in I \}.
\]

When we wish to speak of the variety over the complex numbers we use the notation $V_{\mathbb{C}}(I)$.
Reversing this procedure, if we are given a set $V \subseteq \mathbb{R}^n$ we can compute its defining ideal,
which is the set of all polynomials that vanish on $V$:

\[
I(V) = \{ f \in \mathbb{R}[t] \mid f(a) = 0 \text{ for all } a \in V \}.
\]

Definition 17. Let $A$ be a semi-algebraic set defining an algebraic exponential family
$\mathcal{P}_M = (P_\eta \mid \eta \in M)$ via $M = g^{-1}(A \cap g(N))$. A polynomial $f$ in the vanishing ideal $I(A)$
is a model invariant for $\mathcal{P}_M$.

Remark 18. The term "model invariant" is chosen in analogy to the term "phylogenetic 
invariant" that was coined by biologists working with statistical models that are useful for 
the reconstruction of phylogenetic trees. 

Given a list of polynomials $f_1, \dots, f_k$, the ideal generated by these polynomials is denoted

\[
\langle f_1, \dots, f_k \rangle = \Bigl\{ \sum_{i=1}^{k} h_i \cdot f_i \;\Big|\; h_i \in \mathbb{R}[t] \Bigr\}.
\]

The Hilbert basis theorem says that every ideal in a polynomial ring has a finite generating 
set. Thus, when working with a statistical model that we want to describe algebraically, 
we need to compute a finite list of polynomials that generate the ideal of model invari- 
ants. These equations can be used to address questions like determining the structure of 
singularities which in turn can be used to address asymptotic questions. 

Example 19 (Conditional independence). In Example 15 we gave a set of equations whose
zero set in the cone of positive definite matrices is the independence model obtained from
$X_A \perp\!\!\!\perp X_B \mid X_C$. However, there are more equations, in general, that belong to the ideal
of model invariants $I$. In particular, we have

\[
I = \langle \det \Sigma' \mid \Sigma' \text{ is a } (|C|+1) \times (|C|+1) \text{ submatrix of } \Sigma_{A \cup C,\, B \cup C} \rangle.
\]

The fact that this ideal vanishes on the model follows from the fact that any $\Sigma$ in the
model is positive definite and, hence, each principal minor is invertible. The fact that the
indicated ideal comprises all model invariants can be derived from a result in commutative
algebra (Conca, 1994).

In the discrete case, the polynomials we introduced in Example 16 generate the ideal
of model invariants for the model induced by $X_1 \perp\!\!\!\perp X_2 \mid X_3$. For models induced by
collections of independence statements this need no longer be true; compare Theorem 8 in
Garcia et al. (2005). □

One may wonder what the use is of passing from the set of polynomials exhibited in
Example 15 to the considerably larger set of polynomials described in Example 19, since
both sets of polynomials define the model inside the cone of positive definite matrices. The
smaller set of polynomials has the drawback that there exist singular covariance matrices in
the positive semidefinite cone that satisfy the polynomial constraints but are not limits
of covariance matrices in the model. From an algebraic standpoint, the main problem is
that the ideal generated by the smaller set of polynomials is not a prime ideal. In general,
we prefer to work with the prime ideal given by all model invariants because prime ideals
tend to be better behaved from a computational standpoint and are less likely to introduce
extraneous solutions on boundaries.
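
The failure of primeness can be made concrete with an ideal membership test. The sketch below
is an illustration under assumed index sets $A = \{1,2\}$, $B = \{3\}$, $C = \{4\}$ (a hypothetical
choice, with $p = 4$), using SymPy's Gröbner basis routines. The two determinants of Example 15
generate the smaller ideal; the third $2 \times 2$ minor of $\Sigma_{\{1,2,4\} \times \{3,4\}}$ is not in that
ideal, but its product with $\sigma_{44}$ is.

```python
import sympy as sp

s13, s14, s23, s24, s34, s44 = sp.symbols("s13 s14 s23 s24 s34 s44")
gens = (s13, s14, s23, s24, s34, s44)

# Determinants from Example 15 for (i, j) = (1, 3) and (2, 3), with C = {4}.
minor_1 = s13 * s44 - s14 * s34  # det Sigma_{{1,4} x {3,4}}
minor_2 = s23 * s44 - s24 * s34  # det Sigma_{{2,4} x {3,4}}

# Remaining 2 x 2 minor of Sigma_{{1,2,4} x {3,4}} (rows 1 and 2), as in Example 19.
extra_minor = s13 * s24 - s14 * s23

small_ideal = sp.groebner([minor_1, minor_2], *gens, order="grevlex")

extra_in_small_ideal = small_ideal.contains(extra_minor)
s44_in_small_ideal = small_ideal.contains(s44)
product_in_small_ideal = small_ideal.contains(s44 * extra_minor)
```

The product lies in the smaller ideal because
$\sigma_{44}(\sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23}) = \sigma_{24} \cdot$ `minor_1` $- \sigma_{14} \cdot$ `minor_2`, while neither
factor does, which is exactly what it means for the ideal not to be prime.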

For the conditional independence models described thus far, the equations $I(A)$ that
define the model come from the definition of the model. For instance, conditional
independence imposes natural constraints on covariance matrices of normal random variables and
the joint probability distributions of discrete random variables. When we are presented
with a parametric model, however, it is in general a challenging problem of computational
algebra to compute the implicit description of the model $A$ as a semi-algebraic set. At
the heart of this problem is the computation of the ideal of model invariants $I(A)$, which
can be solved using Gröbner bases. Methods for computing an implicit description from
a parametric description can be found in Cox et al. (1997), though the quest for better
implicitization methods is an active area of research.
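
As a small illustration of implicitization by elimination, consider the hypothetical
parametrization $p_{ij} = a_i b_j$ of a $2 \times 2$ table (independence of two binary variables,
ignoring the normalization constraint). The sketch below, which assumes SymPy and uses variable
names chosen here for illustration, computes a lexicographic Gröbner basis with the parameters
ordered first and keeps the basis elements that are free of parameters.

```python
import sympy as sp

a1, a2, b1, b2 = sp.symbols("a1 a2 b1 b2")
p11, p12, p21, p22 = sp.symbols("p11 p12 p21 p22")
params = (a1, a2, b1, b2)
coords = (p11, p12, p21, p22)

# Ideal of the graph of the parametrization p_ij = a_i * b_j.
graph_ideal = [p11 - a1 * b1, p12 - a1 * b2, p21 - a2 * b1, p22 - a2 * b2]

# Lex order with the parameters first is an elimination order for the parameters.
G = sp.groebner(graph_ideal, *params, *coords, order="lex")

# Basis elements without parameters generate the elimination ideal.
invariants = [g for g in G.exprs if not (g.free_symbols & set(params))]
```

For this example the elimination ideal is generated by the $2 \times 2$ determinant
$p_{11}p_{22} - p_{12}p_{21}$. Lexicographic Gröbner bases can be expensive for larger models, which
is one reason better implicitization methods remain of interest.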

The vanishing ideal of a semi-algebraic set can be used to address many questions
about it, for instance, its dimension. The following definition and
proposition provide a useful characterization of the dimension of a semi-algebraic set.

Definition 20. A set of indeterminates $p_{i_1}, \dots, p_{i_k}$ is algebraically independent for the
ideal $I$ if there is no nonzero polynomial only in $p_{i_1}, \dots, p_{i_k}$ that belongs to $I$.

Proposition 21. The dimension of a semi-algebraic set $A$ is the cardinality of the largest
set of algebraically independent indeterminates for $I(A)$.

The proof that the notion of dimension in Proposition 21, based on algebraically independent
sets of indeterminates, meshes with the usual geometric notion of dimension can be found in
Cox et al. (1997).

The subset $V_{\mathrm{sing}} \subseteq V$ where a variety $V$ is singular is also a variety. Indeed, suppose
that polynomials $f_1, \dots, f_k$ generate the vanishing ideal $I(V)$. Let $J \in \mathbb{R}[x]^{k \times n}$ denote the
Jacobian matrix with entry $J_{ij} = \partial f_i / \partial x_j$.

Proposition 22. A point $a \in V_{\mathbb{C}}(I)$ is a singular point of the complex variety if and
only if $J(a)$ has rank less than the codimension of the largest irreducible component of $V$
containing $a$.

The singularities of the real variety are defined to be the intersection of the singular
locus of $V_{\mathbb{C}}(I)$ with $\mathbb{R}^n$. Proposition 22 yields a direct way to compute, as an algebraic
variety, the singular locus of $V$. Indeed, the rank of the Jacobian matrix is less than $c$ if
and only if the $c \times c$ minors of $J$ are all zero. Thus, if $I$ defines an irreducible variety of
codimension $c$, the ideal $\langle M_c(J), f_1, \dots, f_k \rangle$ has as zero set the singular locus of $V$, where
$M_c(J)$ denotes the set of $c \times c$ minors of $J$. If the variety is not irreducible, the singular
set consists of the union of the singular sets of all the irreducible components together with
the sets of all pairwise intersections between irreducible components.

Removing the singularities $V_{\mathrm{sing}}$ from $V$ one obtains a smooth manifold such that the
local geometry at a non-singular point of V is determined by a linear space, namely, 
the tangent space. At singular points, the local geometry can be described using the 
tangent cone, which is the semi-algebraic set that approximates the limiting behavior of 
the secant lines that pass through the point of interest. In the context of parameter 
spaces of statistical models, the study of this limiting behavior is crucial for the study 
of large sample asymptotics at a singular point. The geometry of the tangent cone for 
semi-algebraic sets can be complicated and we postpone an in-depth study for a later 
publication. For the singular models that we encounter in the next section, the crucial 
point on the tangent cone is the following proposition. 

Proposition 23. Suppose that $A = V_1 \cup \dots \cup V_m$ is the union of smooth algebraic varieties
and let $a$ be a point in the intersection $V_1 \cap \dots \cap V_j$ such that $a \notin V_k$ for $k \ge j + 1$. Then
the tangent cone of $A$ at $a$ is the union of the tangent planes to $V_1, \dots, V_j$ at $a$.

4.2 A conditional independence model with singularities 

Let $X = (X_1, X_2, X_3)$ have a trivariate normal distribution $\mathcal{N}_3(\mu, \Sigma)$, and define a
model by requiring that $X_1 \perp\!\!\!\perp X_2$ and simultaneously $X_1 \perp\!\!\!\perp X_2 \mid X_3$. By Example 15 the
model is an algebraic exponential family given by the subset $M = \zeta^{-1}(A \cap \zeta(N))$, where
$\zeta(N) = \mathbb{R}^3 \times \mathbb{R}^{3 \times 3}_{\mathrm{pd}}$ is the Gaussian mean parameter space and the algebraic variety

\[
A = \{ (\mu, \Sigma) \in \mathbb{R}^3 \times \mathbb{R}^{3 \times 3}_{\mathrm{sym}} \mid \sigma_{12} = 0,\ \det(\Sigma_{\{1,3\} \times \{2,3\}}) = -\sigma_{13}\sigma_{23} = 0 \}.
\]

Here, $\mathbb{R}^{3 \times 3}_{\mathrm{sym}}$ is the space of symmetric $3 \times 3$-matrices. The set $A$ is defined equivalently by
the joint vanishing of $\sigma_{12}$ and $\sigma_{13}\sigma_{23}$. Hence, $A = A_{13} \cup A_{23}$ for

\[
\begin{aligned}
A_{13} &= \{ (\mu, \Sigma) \in A \mid \sigma_{12} = \sigma_{13} = 0 \}, \\
A_{23} &= \{ (\mu, \Sigma) \in A \mid \sigma_{12} = \sigma_{23} = 0 \}.
\end{aligned}
\]

This decomposition as a union reflects the well-known fact that

\[
\bigl[ X_1 \perp\!\!\!\perp X_2 \ \wedge\ X_1 \perp\!\!\!\perp X_2 \mid X_3 \bigr]
\implies
\bigl[ X_1 \perp\!\!\!\perp (X_2, X_3) \ \vee\ X_2 \perp\!\!\!\perp (X_1, X_3) \bigr],
\]

which holds for the multivariate normal distribution but also when $X_3$ is a binary
variable; compare (Dawid, 1980, Thm. 8.3). By Proposition 23 the singular locus of $A$ is the
intersection

\[
A_{\mathrm{sing}} = A_{13} \cap A_{23} = \{ (\mu, \Sigma) \in A \mid \sigma_{12} = \sigma_{13} = \sigma_{23} = 0 \},
\]

which corresponds to diagonal covariance matrices $\Sigma$, or in other words, complete
independence of the three random variables $X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_3$.
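
The same singular locus can be recovered from the Jacobian criterion. The following sketch
assumes SymPy and works only in the off-diagonal coordinates $\sigma_{12}, \sigma_{13}, \sigma_{23}$, since
the means and variances are unconstrained; this restriction and the variable names are
assumptions of the illustration. Both components of $A$ have codimension $c = 2$, so the
$2 \times 2$ minors of the Jacobian are added to the generators $\sigma_{12}$ and $\sigma_{13}\sigma_{23}$.

```python
import itertools
import sympy as sp

s12, s13, s23 = sp.symbols("s12 s13 s23")
coords = [s12, s13, s23]

# Generators of the vanishing ideal of A in the off-diagonal coordinates.
f = sp.Matrix([s12, s13 * s23])
J = f.jacobian(coords)  # 2 x 3 Jacobian matrix

c = 2  # common codimension of the components A_13 and A_23
minors = [
    J.extract(list(rows), list(cols)).det()
    for rows in itertools.combinations(range(J.rows), c)
    for cols in itertools.combinations(range(J.cols), c)
]
nonzero_minors = [m for m in minors if m != 0]

# Ideal generated by the c x c minors together with the generators of I(A).
singular_ideal = sp.groebner(list(f) + nonzero_minors, *coords, order="lex")
```

The reduced Gröbner basis of this ideal consists of the three coordinates themselves, so its zero
set is $\sigma_{12} = \sigma_{13} = \sigma_{23} = 0$, in agreement with the intersection $A_{13} \cap A_{23}$.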

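Given $n$ independent and identically distributed normal random vectors $X_1, \dots, X_n \in
\mathbb{R}^3$, define the empirical mean and covariance matrix as

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^t, \tag{4.1}
\]

respectively. The likelihood ratio test statistic for testing the model based on parameter
space $M$ against the regular exponential family of all trivariate normal distributions can
be expressed as

\[
\lambda_M(\bar{X}, S) = \log\left( \frac{s_{11} s_{22}}{s_{11} s_{22} - s_{12}^2} \right)
+ \min\left\{ \log\left( \frac{s_{33.2}}{s_{33.12}} \right),\ \log\left( \frac{s_{33.1}}{s_{33.12}} \right) \right\}, \tag{4.2}
\]

where for $A \subseteq \{1, 2\}$, $s_{33.A}$ is the empirical conditional variance

\[
s_{33.A} = s_{33} - S_{\{3\} \times A}\, S_{A \times A}^{-1}\, S_{A \times \{3\}}.
\]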

The three terms in (4.2) correspond to tests of the hypotheses

\[
X_1 \perp\!\!\!\perp X_2, \qquad X_1 \perp\!\!\!\perp X_3 \mid X_2, \qquad \text{and} \qquad X_2 \perp\!\!\!\perp X_3 \mid X_1.
\]

Note that a joint distribution satisfies $X_1 \perp\!\!\!\perp (X_2, X_3)$ if and only if it satisfies both
$X_1 \perp\!\!\!\perp X_2$ and $X_1 \perp\!\!\!\perp X_3 \mid X_2$.

If $(\mu, \Sigma)$ is an element of the smooth manifold $A \setminus A_{\mathrm{sing}}$, then $\lambda_M(\bar{X}, S)$ converges to a
$\chi^2_2$-distribution as $n$ tends to infinity; but over the singular locus the limiting distribution
is non-standard as detailed in Drton (2006).

Proposition 24. Let $(\mu, \Sigma) \in A_{\mathrm{sing}}$. As $n \to \infty$, the likelihood ratio test statistic $\lambda_M(\bar{X}, S)$
converges to the minimum of two dependent $\chi^2_2$-distributed random variables, namely,

\[
\lambda_M(\bar{X}, S) \longrightarrow_d \min(W_{12} + W_{13},\ W_{12} + W_{23}) = W_{12} + \min(W_{13}, W_{23})
\]

for three independent $\chi^2_1$-random variables $W_{12}$, $W_{13}$ and $W_{23}$.
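
The limiting distribution in Proposition 24 is easy to explore by simulation. The sketch below
is a Monte Carlo illustration only, assuming that each $W$ is chi-square distributed with one
degree of freedom; the sample size and seed are arbitrary choices. It draws from
$W_{12} + \min(W_{13}, W_{23})$ and compares its upper quantile with that of the $\chi^2_2$ limit
that applies at smooth points.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 200_000  # arbitrary Monte Carlo sample size

# Three independent chi-square variables with one degree of freedom each (assumption).
w12, w13, w23 = rng.chisquare(df=1, size=(3, n_draws))

# Limit at a singular point: minimum of two dependent chi-square(2) variables.
limit_singular = w12 + np.minimum(w13, w23)

# Limit at a smooth point of the model: chi-square with two degrees of freedom.
limit_smooth = rng.chisquare(df=2, size=n_draws)

q95_singular = np.quantile(limit_singular, 0.95)
q95_smooth = np.quantile(limit_smooth, 0.95)
mean_singular = limit_singular.mean()
```

Since $W_{12} + \min(W_{13}, W_{23}) \le W_{12} + W_{13}$, the singular limit is stochastically smaller
than a $\chi^2_2$ variable, so critical values taken from $\chi^2_2$ would be conservative at the
singular locus. For two independent $\chi^2_1$ variables the expected minimum is $1 - 2/\pi$, which
gives a simple sanity check on the simulated mean.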

Similar asymptotics arise in the model of joint marginal and conditional independence
in the discrete case with $X_3$ binary. In this case the variety breaks again into the union
of two independence varieties $X_1 \perp\!\!\!\perp \{X_2, X_3\}$ and $X_2 \perp\!\!\!\perp \{X_1, X_3\}$, whose intersection
is the complete independence variety corresponding to $X_1 \perp\!\!\!\perp X_2 \perp\!\!\!\perp X_3$. Non-standard
asymptotics will occur at the intersection of these two varieties. However, as both of the
varieties $X_1 \perp\!\!\!\perp \{X_2, X_3\}$ and $X_2 \perp\!\!\!\perp \{X_1, X_3\}$ are smooth, the tangent cone is simply the
union of the two tangent spaces to the two component varieties. The asymptotics behave
in a manner similar to the Gaussian case, with a minimum of chi-square distributions as the limit.

4.3 Hidden random variables 

Another important use for the implicit equations defining a model is that they can be
used to determine a (partial) description of any new models that arise from the given model
via marginalization. In particular, algebraic methods can be used to explore properties of 
models with hidden random variables. In this section, we describe how to derive model 
invariants via elimination in the presence of hidden variables for Gaussian and discrete 
random variables. 

Proposition 25. Suppose that the random vector $X = (X_1, \dots, X_p)$ is distributed
according to a multivariate normal distribution from a model with ideal of model invariants
$I \subseteq \mathbb{R}[\mu_i, \sigma_{ij} \mid 1 \le i \le j \le p]$. Then the elimination ideal
$I \cap \mathbb{R}[\mu_i, \sigma_{ij} \mid 1 \le i \le j \le p-1]$ comprises the model invariants of the model
created by marginalizing to $X' = (X_1, \dots, X_{p-1})$.

The indicated elimination can be computed using Gröbner bases (Cox et al., 1997). A
similar type of elimination formulation can be given for the marginalization in the discrete
case.
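
As an illustration of the Gaussian elimination step, the sketch below marginalizes the
trivariate model with $X_1 \perp\!\!\!\perp X_2$ and $X_1 \perp\!\!\!\perp X_2 \mid X_3$ to $(X_1, X_2)$. It assumes SymPy,
works in centered form (the means are omitted), and takes the two defining polynomials as
generators of the ideal; these are assumptions of the example rather than part of the
proposition.

```python
import sympy as sp

s11, s12, s13, s22, s23, s33 = sp.symbols("s11 s12 s13 s22 s23 s33")
eliminate = (s13, s23, s33)  # covariance entries involving X_3
keep = (s11, s12, s22)       # covariance entries of (X_1, X_2)

# sigma_12 = 0 and det(Sigma_{{1,3} x {2,3}}) = 0.
ideal_gens = [s12, s12 * s33 - s13 * s23]

# Lex order with the eliminated variables first is an elimination order.
G = sp.groebner(ideal_gens, *eliminate, *keep, order="lex")
marginal_invariants = [g for g in G.exprs if not (g.free_symbols & set(eliminate))]
```

The only surviving invariant is $\sigma_{12}$, as expected: the marginal model for $(X_1, X_2)$ is
the model of independence of $X_1$ and $X_2$.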

Proposition 26. Let $X_1, \dots, X_p$ be discrete random variables with $X_k$ taking values in
$[m_k] = \{1, \dots, m_k\}$. Consider a model for the random vector $(X_1, \dots, X_p)$ that has the
ideal of model invariants $I \subseteq \mathbb{R}[p_{i_1 \dots i_p}]$. Let $J \subseteq \mathbb{R}[q_{i_1 \dots i_{p-1}}, p_{i_1 \dots i_p}]$ be the ideal

\[
J = I + \Bigl\langle q_{i_1 \dots i_{p-1}} - \sum_{j=1}^{m_p} p_{i_1 \dots i_{p-1} j} \;\Big|\; i_k \in [m_k] \Bigr\rangle.
\]

Then the elimination ideal $J \cap \mathbb{R}[q_{i_1 \dots i_{p-1}}]$ is the ideal of model invariants of the model
created by marginalizing to $X' = (X_1, \dots, X_{p-1})$.

Up to this point, we have made very little use of the inequality constraints that can arise
in the definition of a semi-algebraic set. In both of our conditional independence models, 
the inequality constraints arose from the fact that we needed to generate a probability 
distribution, and were supplied by the positive definite cone or the probability simplex. 
In general, however, we may need non-trivial inequality constraints to describe the model. 
Currently, very little is known about the needed inequality constraints, even in simple 
examples. This occurs, for instance, in the marginalization of conditional independence 
models. 

Example 27 (Marginalization of an Independence Model). Let $A$ be the semi-algebraic
set of probability vectors for a discrete random vector $X = (X_1, X_2, X_3)$ satisfying the
conditional independence constraint $X_1 \perp\!\!\!\perp X_2 \mid X_3$. Let $\psi(A)$ denote the image of this
model after marginalizing out the random variable $X_3$.

The joint distribution of $X_1$ and $X_2$ can be represented as a matrix $(p_{ij})$. Assuming
as above that $X_k$ takes on values in $[m_k]$, the conditional independence constraint
$X_1 \perp\!\!\!\perp X_2 \mid X_3$ implies that the matrix $(p_{ij})$ has rank less than or equal to $m_3$. The set of
equality constraints that arise from this parametrization is the set of $(m_3 + 1) \times (m_3 + 1)$
minors of the matrix $(p_{ij})$. However, it is not true that these equality constraints together
with the inequality constraints arising from the probability simplex suffice to define this
model. The smallest example of this occurs when $m_1 = m_2 = 4$ and $m_3 = 3$. In this case
the ideal $I(\psi(A))$ is generated by the determinant of the generic $4 \times 4$ matrix $(p_{ij})$. Fix a
small value of e > 0. The matrix 
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represents a probability distribution that satisfies the determinant constraint (the ma- 
trix has rank 3). However, it can be shown that this probability distribution does not 
belong to $\psi(A)$. That is, this bivariate distribution is not the marginalization of a
trivariate distribution exhibiting conditional independence. Thus, in addition to the equality
constraint, there are non-trivial inequality constraints that define the marginalized
independence model. More about this example can be found in Mond et al. (2003). □
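
The rank constraint in Example 27 can be illustrated numerically. The sketch below assumes
NumPy and uses randomly generated conditional distributions (hypothetical values, not the
matrix of the example) with $m_1 = m_2 = 4$ and $m_3 = 3$. Marginalizing out $X_3$ writes $(p_{ij})$
as a mixture of $m_3$ rank-one matrices, so the determinant vanishes. The sketch shows only this
necessary equality constraint; it says nothing about the additional inequality constraints.

```python
import numpy as np

rng = np.random.default_rng(2)
m1, m2, m3 = 4, 4, 3

pk = rng.dirichlet(np.ones(m3))          # P(X3 = k)
a = rng.dirichlet(np.ones(m1), size=m3)  # a[k, i] = P(X1 = i | X3 = k)
b = rng.dirichlet(np.ones(m2), size=m3)  # b[k, j] = P(X2 = j | X3 = k)

# Marginal of (X1, X2): p_ij = sum_k P(X3=k) a[k, i] b[k, j], a sum of m3 rank-one terms.
P = np.einsum("k,ki,kj->ij", pk, a, b)

rank = np.linalg.matrix_rank(P, tol=1e-10)
determinant = np.linalg.det(P)
```

A matrix in the probability simplex with vanishing determinant need not arise this way, which is
the point of the example: membership in $\psi(A)$ also requires a factorization with nonnegative
factors.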



5. Solving likelihood equations 

Let $\mathcal{P} = (P_\eta \mid \eta \in N)$ be a regular exponential family with canonical sufficient statistic
$T$. If we draw a sample $X_1, \dots, X_n$ of independent random vectors from $P_\eta$, then, as
detailed in Section 2, the canonical statistic becomes $\sum_{i=1}^{n} T(X_i) =: n\bar{T}$ and the
log-likelihood function takes the form

\[
\ell(\eta \mid \bar{T}) = n \left[ \eta^t \bar{T} - \phi(\eta) \right]. \tag{5.1}
\]

For maximum likelihood estimation in an algebraic exponential family $\mathcal{P}_M = (P_\eta \mid \eta \in M)$,
$M \subseteq N$, we need to maximize $\ell(\eta \mid \bar{T})$ over the set $M$.

Let $A$ and $g$ be the semi-algebraic set and the diffeomorphism that define the parameter
space $M$. Let $I(A) = \langle f_1, \dots, f_m \rangle$ be the ideal of model invariants and $\gamma = g(\eta)$ the
parameters after reparametrization based on $g$. If boundary issues are of no concern then
the maximization problem can be relaxed to

\[
\max\ \ell(\gamma \mid \bar{T}) \quad \text{subject to } f_i(\gamma) = 0,\ i = 1, \dots, m, \tag{5.2}
\]

where

\[
\ell(\gamma \mid \bar{T}) = g^{-1}(\gamma)^t \bar{T} - \phi(g^{-1}(\gamma)). \tag{5.3}
\]

If $\ell(\gamma \mid \bar{T})$ has rational partial derivatives then the maximization problem (5.2) can be
solved algebraically by solving a polynomial system of critical equations. Details on this
approach in the case of discrete data can be found in Catanese et al. (2006); Hosten et al.
(2005). However, depending on the interplay of $g^{-1}$ and the mean parametrization $\zeta$, which
according to (2.1) is the gradient map of the log-Laplace transform $\phi$, such an algebraic
approach to maximum likelihood estimation is possible also in other algebraic exponential
families.

Proposition 28. The function $\ell(\gamma \mid \bar{T})$ has rational partial derivatives if (i) the map
$\zeta \circ g^{-1}$ is a rational map and (ii) the map $g^{-1}$ has partial derivatives that are rational
functions.

Example 29 (Discrete likelihood equations). For the discrete exponential family from 
Example [H the mean parameters are the probabilities $p_1, \dots, p_{m-1}$. The inverse of the
mean parametrization map has component functions $(\zeta^{-1})_x = \log(p_x / p_m)$, where
$p_m = 1 - p_1 - \dots - p_{m-1}$. Since $d\log(t)/dt = 1/t$ is rational, $\zeta^{-1}$ has rational partial derivatives.
Hence, maximum likelihood estimates can be computed algebraically if the discrete
algebraic exponential family is defined in terms of the probability coordinates $p_1, \dots, p_{m-1}$.
This is the context of the above mentioned work by Catanese et al. (2006); Hosten et al.
(2005). □
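
To make the algebraic approach concrete, the following sketch solves the likelihood equations
for a small discrete model given in parametric form. It is an illustration under stated
assumptions: the model is independence of two binary variables with hypothetical parameters
$s = \mathrm{Prob}(X_1 = 1)$ and $t = \mathrm{Prob}(X_2 = 1)$, the counts are invented, and SymPy is used
for the algebra. The partial derivatives of the log-likelihood are rational functions, so
clearing denominators yields a polynomial system.

```python
import sympy as sp

s, t = sp.symbols("s t")

# Hypothetical 2 x 2 table of counts, indexed by (x1, x2).
counts = {(1, 1): 10, (1, 2): 20, (2, 1): 30, (2, 2): 40}

# Cell probabilities under independence of X_1 and X_2.
probs = {
    (1, 1): s * t,
    (1, 2): s * (1 - t),
    (2, 1): (1 - s) * t,
    (2, 2): (1 - s) * (1 - t),
}

loglik = sum(u * sp.log(probs[cell]) for cell, u in counts.items())

# Rational score equations; their numerators form the polynomial critical equations.
scores = [sp.diff(loglik, v) for v in (s, t)]
critical_polys = [sp.fraction(sp.cancel(sp.together(sc)))[0] for sc in scores]

solutions = sp.solve(critical_polys, [s, t], dict=True)
```

For these counts the unique critical point is $s = 3/10$, $t = 2/5$, the observed marginal
proportions. In larger models the numerators acquire extraneous roots on the zero set of the
denominators, which is why the general theory pays attention to removing such solutions.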



Example 30 (Factor analysis). The mean parametrization $\zeta$ for the family of multivariate
normal distributions and its inverse $\zeta^{-1}$ are based on matrix inversions and thus are rational
maps. Thus algebraic maximum likelihood estimation is possible whenever a Gaussian
algebraic exponential family is defined in terms of coordinates $g(\eta)$ for a rational map $g$.
This includes families defined in the mean parameters $(\mu, \Sigma)$ or the natural parameters
$(\Sigma^{-1}\mu, \Sigma^{-1})$.

As a concrete example, consider the factor analysis model with one factor and four
observed variables. In centered form this model is the family of multivariate normal
distributions $\mathcal{N}_4(0, \Sigma)$ on $\mathbb{R}^4$ with positive definite covariance matrix

\[
\Sigma = \operatorname{diag}(\omega) + \lambda\lambda^t, \tag{5.4}
\]

where $\omega \in (0, \infty)^4$ and $\lambda \in \mathbb{R}^4$. Equation (5.4) involves polynomial expressions in
$\theta = (\omega, \lambda)$. For algebraic maximum likelihood estimation, however, it is computationally more
efficient to employ the fact that condition (5.4) is equivalent to requiring that the positive
definite natural parameter $\Sigma^{-1}$ can be expressed as

\[
\Sigma^{-1}(\theta) = \operatorname{diag}(\omega) - \lambda\lambda^t, \tag{5.5}
\]

with $\theta = (\omega, \lambda) \in (0, \infty)^4 \times \mathbb{R}^4$; compare Drton et al. (2007, §8). When parametrizing $\Sigma^{-1}$
the map $g$ is the identity map.

Let $S$ be the empirical covariance matrix from a sample of random vectors $X_1, \dots, X_n$
in $\mathbb{R}^4$; compare (4.1). We can solve the maximization problem (5.2) by plugging the
polynomial parametric expression for $\gamma = \Sigma^{-1}$ from (5.5) into the Gaussian version of the
log-likelihood function in (5.3). Taking partial derivatives we find the equations

\[
\frac{1}{\det(\Sigma^{-1}(\theta))} \cdot \frac{\partial \det(\Sigma^{-1}(\theta))}{\partial \theta_i}
= \operatorname{trace}\left( S \cdot \frac{\partial \Sigma^{-1}(\theta)}{\partial \theta_i} \right),
\qquad i = 1, \dots, 8. \tag{5.6}
\]

These equations can be made polynomial by multiplying by $\det(\Sigma^{-1}(\theta))$. Clearing the
denominator introduces many additional solutions $\theta \in \mathbb{C}^8$ to the system, which lead to
non-invertible matrices $\Sigma^{-1}(\theta)$. However, these extraneous solutions can be removed using
an operation called saturation. After saturation, the (complex) solution set of (5.6) is seen
to consist of 57 isolated points. These 57 solutions come in pairs $\theta_\pm = (\omega, \pm\lambda)$; one solution
has $\lambda = 0$.
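
The polynomial system obtained from (5.6) can be set up symbolically. The sketch below assumes
SymPy and a hypothetical empirical covariance matrix `S` that is not one of the examples
discussed in the text. It only constructs the eight equations after clearing the denominator;
the saturation step and the computation of the 57 solutions are not attempted here.

```python
import sympy as sp

w = sp.symbols("w1:5")    # omega_1, ..., omega_4
lam = sp.symbols("l1:5")  # lambda_1, ..., lambda_4
theta = list(w) + list(lam)

# Hypothetical empirical covariance matrix, chosen for illustration only.
S = sp.Matrix([
    [4, 1, 1, 1],
    [1, 3, 1, 1],
    [1, 1, 2, 1],
    [1, 1, 1, 2],
])

lam_vec = sp.Matrix(lam)
K = sp.diag(*w) - lam_vec * lam_vec.T  # Sigma^{-1}(theta) as in (5.5)
det_K = sp.expand(K.det(method="berkowitz"))

# (5.6) multiplied by det(K):
#   d det(K) / d theta_i - det(K) * trace(S * dK / d theta_i) = 0,  i = 1, ..., 8.
equations = [
    sp.expand(sp.diff(det_K, th) - det_K * (S * K.diff(th)).trace())
    for th in theta
]
```

Replacing $\lambda$ by $-\lambda$ leaves the four equations for $\omega$ unchanged and flips the sign of the
four equations for $\lambda$, so the solution set is symmetric under this sign change, which is
consistent with solutions occurring in pairs $(\omega, \pm\lambda)$.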

When the empirical covariance matrix $S$ is rounded, we can compute the 57 solutions
using software for algebraic and numerical solving of polynomial equations. For the
example

we find that (5.6) has 11 feasible solutions in $(0, \infty)^4 \times \mathbb{R}^4$. Via (5.5), these solutions define
6 distinct factor analysis covariance matrices. Two of these matrices yield local maxima of
the likelihood function:



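\[
\begin{pmatrix}
13 & 2.1242 & 0.9870 & 2.5876 \\
2.1242 & 11 & 0.8941 & 2.3440 \\
0.9870 & 0.8941 & 9 & 1.0891 \\
2.5876 & 2.3440 & 1.0891 & 7
\end{pmatrix},
\qquad
\begin{pmatrix}
13 & 2.1816 & 1.0100 & 1.0962 \\
2.1816 & 11 & 2.3862 & 2.3779 \\
1.0100 & 2.3862 & 9 & 1.1990 \\
1.0962 & 2.3779 & 1.1990 & 7
\end{pmatrix}.
\]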



The matrix to the left has the larger value of the likelihood function and we claim that it
yields the global maximum. For this claim to be valid we have to check that no matrix
close to the boundary of the set $\{\Sigma^{-1}(\theta) \mid \theta \in (0, \infty)^4 \times \mathbb{R}^4\}$ has larger value of the
likelihood function. Suppose this were not true. Then the likelihood function would have
to achieve its global maximum over the cone of positive definite matrices outside the set
$\{\Sigma^{-1}(\theta) \mid \theta \in (0, \infty)^4 \times \mathbb{R}^4\}$. In order to rule out this possibility, we consider all the
complex solutions $\theta \in \mathbb{C}^8 \setminus (0, \infty)^4 \times \mathbb{R}^4$ of (5.6) that induce real and positive definite matrices
$\Sigma^{-1}(\theta)$. There are ten such solutions, which all have $\omega \in \mathbb{R}^4$ and purely imaginary $\lambda \in i\mathbb{R}^4$.
There are five different induced matrices $\Sigma^{-1}(\theta)$, but at all of them the likelihood function
is smaller than for the two quoted local maximizers. This confirms our claim.

As a second interesting example consider

The equations (5.6) have again 11 feasible solutions $\theta$. Associated are 6 distinct factor
analysis covariance matrices that all correspond to saddle points of the likelihood function.
Hence, if we close the set of inverse covariance matrices $\{\Sigma^{-1}(\theta) \mid \theta \in (0, \infty)^4 \times \mathbb{R}^4\}$, then
the global optimum of the likelihood function over this closure must be attained on the
boundary.

In order to determine which boundary solution provides the global maximum of the
likelihood function, it is more convenient to switch back to the standard parameterization
in (5.4), which writes the covariance matrix as $\Sigma(\theta) = \operatorname{diag}(\omega) + \lambda\lambda^t$ for $\theta = (\omega, \lambda)$ in
$(0, \infty)^4 \times \mathbb{R}^4$. The closure of $\{\Sigma(\theta) \mid \theta \in (0, \infty)^4 \times \mathbb{R}^4\}$ is obtained by closing the
parameter domain to $[0, \infty)^4 \times \mathbb{R}^4$. Since $S_2$ is positive definite, the global maximizer of the
likelihood function must be a matrix of full rank, which implies that at most one of the four
parameters $\omega_i$ can be zero. In each of the four possible classes of boundary cases the induced
likelihood equations (in 7 parameters) have a closed form solution leading to a unique covariance
matrix. We find that the global maximum is achieved in the case $\omega_1 = 0$. The global
maximizer of the likelihood function over the closure of the parameter space equals

In the factor analysis literature data leading to such boundary problems are known as
Heywood cases. Hence, our computation proves that $S_2$ constitutes a Heywood case. □



6. Conclusion 

In this paper, we have attempted to present a useful, unified definition of an alge- 
braic statistical model. In this definition, an algebraic model is a submodel of a reference 
model with nice statistical properties. Working primarily with small examples of con- 
ditional independence models, we have tried to illustrate how our definition might be a 
useful framework, in which the geometry of parameter spaces can be related to properties 
of statistical inference procedures. Since we impose algebraic structure, this geometry can 
be studied using algebraic techniques, which allow one to tackle problems where simple 
linear arguments will not work. In order to apply these algebraic techniques in a partic- 
ular example of interest, one can resort to one of the many software systems, both free 
and commercial, that provide implementations of algorithms for carrying out the necessary
computations. A comprehensive list of useful software can be found in Chapter 2 of
Pachter and Sturmfels (2005).



While we believe that future work in algebraic statistics may involve reference models 
in which the notion of "nice statistical properties" is filled with life in many different ways, 
we also believe that the most important class of reference models are regular exponential 
families. This led us to consider what we termed algebraic exponential families. These 
families were shown to be flexible enough to encompass structures arising from
marginalization, i.e., the involvement of hidden variables. Hidden variable models typically do not
form curved exponential families, which triggered Geiger et al. (2001) to introduce their
stratified exponential families. These stratified families are more general than both
algebraic and curved exponential families but, as our Example 9 suggests, they seem in fact
to be too general to allow the derivation of results that would hold in the entire class
of models. In algebraic exponential families, on the other hand, the restriction to
semi-algebraic sets entails that parameter spaces always have nice local geometric properties
and phenomena as created in Example 9 cannot occur. In light of this fact, our algebraic
exponential families appear to be in particular a good framework for the study of hidden
variable models, which are widely used models whose statistical properties have yet to be
understood in entirety.